DUDLEY KNOX LIBRARY 
NAVAL POSI-.AUUATr, 
1'ONTEREY. CvLIr r 



SC'KOCi. 

r\ 93 c m 



NAVAL POSTGRADUATE SCHOOL 

Monterey, California 




THESIS 



BENCHMARKING THE SELECTION AND PROJECTION 
OPERATIONS, AND ORDERING CAPABILITIES 
OF RELATIONAL DATABASE MACHINES 

by 

Robert A. Bogdanowicz 
September 1983 

Thesis Advisor: David K. Hsiao 



Approved for public release; distribution unlimited 



T214369 



SECURITY CLASSIFICATION OF THIS PAGE fWhM Date Entered) 



REPORT DOCUMENTATION PAGE 


READ INSTRUCTIONS 
BEFORE COMPLETING FORM 


1. REPORT NUMBER 


2. GOVT ACCESSION NO. 


3. RECIPIENT’S CATALOG NUMBER 


4. TITLE (mnd SuMtlm) 

Benchmarking the Selection and Projection 
Operations, and Ordering Capabilities of 
Relational Database Machines 


5. TYPE OF REPORT 4 PERIOO COVERED 

Master's Thesis 
September 1983 


6. PERFORMING ORG. REPORT NUMBER 


7. AUTHOAf.J 

Robert A. Bogdanowicz 


3. CONTRACT OR GRANT NUM8ERf») 


>. PERFORMING ORGANIZATION NAME ano address 

Naval Postgraduate School 
Monterey, California 93940 


10. PROGRAM ELEMENT, PROJECT, TASK 
AREA 4 WORK UNIT NUMBERS 


11. CONTROLLING OFFICE NAME AND AOORESS 

Naval Postgraduate School 
Monterey, California 93940 


12. REPORT OATE 

September 1983 


13. NUMBER OF PAGES 
66 


»4. MONITORING AGENCY NAME 4 AOORESS (It d! iterant from Controlling Office) 


15. SECURITY CLASS, (of thia report) 

UNCLASSIFIED 


15a. 0 EC L ASSIFI CATION/ DOWNGRADING 

SCH EDULE 



If. DISTRIBUTION STATEMENT (ol thla Raport) 



Approved for public release; distribution unlimited 



17. DISTRIBUTION STATEMENT (of tha ebetrect entered In Stock 20, It different from Report) 



IS. SUPPLEMENTARY NOTES 



19. KEY WOROS (Continue on reveree aide It neceaeery and Identity by block number) 

benchmarking, relational database machines, database machines 



20. ABSTRACT (Conttnuo on reveree aide It neceeeery and Identity by block number) g thesis deSCribeS ttl€ 

performance-measurement experiments designed for a number of back- 
end, relational database machine configurations. An in-depth study 
of the tests and results of the two relational operations, namely, 
selection and projection, on a specific configuration is presented. 
In addition, tests are made on the ordering capabilities and per- 
formance of the machine configuration. The goal of the work is to 
lead to a development for a machine-independent methodology for 
benchmarking the selection and projection operations and on order- 
inq capabilities of database machines. 

DO , JAM *71 1473 EDITION OF 1 HOV •# IS OBSOLETE 

S/N 0102- IF* 014- 6601 



J SECURITY CL ASSIFI CATION OF THIS PAGE (Whan Data Kntorac 



approved for public release; distribution unlimited. 



Benchmarking the Selection and Projection Operations, 
and Ordering capabilities of Relational Database Machines 



by 



Robert A. ^cgdanowicz 
Lieutenant, United States Navy 
E.S., Illinois Institute of Technology, 1977 



Submitted in partial fulfillment of the 
requirements for the degree of 



MASTER OF SCIENCE IN COMPUTER SCIENCE 



from the 

NAVAL POSTGRADUATE SCHOOL 
September 1983 



ABSTBACT 



This thesis describes the perforraa nee -measure mer.t experi- 
ments designed for a number of backend, relational database 
machine configurations. An in-depth study of the -eats and 
results cf the t to relational operations, namely, selecr ion 
and projection, on a specific configuration is presented. 
In a d dit ion , ts sts are made on the ordering capabilities and 
performance cf the machine configuration. The goal of the 
work is tc lead to a development for a machine-independent 
methodology for benchmarking the selection and projection 
operations and on ordering capabilities of database 
machines. 



3 



TABLE OF CONTENTS 



DUDLEY KH07 LIBRARY 



NA 

K 



r, 'JATC SCHOOL 
I. C .Lii* OkHIA 935*4 



I. INTRODUCTION 8 

A. BENCHMARKING DATABASE MACHINES 3 

1. A Definition 3 

2. Database Machine Benchmarks 9 

3. Objectives 10 

B. THE BENCHMARKING ENVIRONMENT 11 

1. The Host 12 

2. The Host Interface 12 

3. Machine Configurations 13 

C. THE BENCHMARKED MACHINE 14 

1. Modular Design 14 

2. Technology and Functionality of Modules . 15 

II. THE DATABASE 18 

A. THE USE OF SYNTHETIC DATA 18 

3. GENERATION OF THE SYNTHESIZED DATA 19 

III. THE QUERY LANGUAGE 23 

A. SYNTAX AND SEMANTICS . . . 24 

B. TEST QUERIES 25 

1. Timing Considerations 26 

2. Objectives 27 

TV. PERFORMANCE EVALUATION OF THE SELECTION OPERATION 23 

A. DEFINITION OF A SELECTION 28 

3. SELECTIONS IN THE QUERY LANGUAGE 28 

C. AN ENVIRONMENT FOR THE MEASUREMENTS 29 

D. SELECTION MEASUREMENTS 30 

1. The Percentage of Selection 30 

2. Effects of Clustered and Non-Clust ered 

Indiciss 33 



4 



3 



Effects of Data Compression on Selection 

Queries 3 5 

4. Effects of Ordering and Randomizing the 

Database Entries 3 9 

E. CONCLUSIONS 42 

V. PERFORMANCE EVALUATION OF PROJECTION OPERATION . . 45 

A. DEFINITION OF A PROJECTION 45 

B. PROJECTIONS IN THE QUERY LANGUAGE 45 

C. AN ENVIRONMENT FOR THE MEASUREMENTS 46 

D. PROJECTION MEASUREMENTS 47 

1. Percentage of Projections on Non- Key 

Attributes 47 

2. Comparison of the Equivalent Queries cn 

Selection 52 

E. CONCLUSIONS 59 

VI. CONCLUDING REMARKS 61 

A. OVERALL OBSERVATIONS OF THE MACHINE 

PERFORMANCE 61 

3. DATAEASE AND MACHINE LIMITATIONS 62 

C. RECOMMENDATIONS FOR FUTURE BENCHMARKING 

EFFORTS 6 3 

LIST OF REFERENCES 65 

INITIAL DISTRIBUTION LIST 66 



5 



LIST OF FIGURES 



1.1 The RDM-1100 Relational Database Computer ... 15 

2. 1 Tuple T emplates 21 

4.1 Simple Selects with no Indicies 31 

4.2 5%-SeIects with Mon-Clustered Index on P5 and 

P10 32 

4.3 Ordered Retrieves with Indicies on KEY 34 

4.4 Retrieves with and without Indicies 35 

4.5 Speed Improvement using Non -Clustered Index on 

F5 and P10 3 7 

4.6 Effects of Data Compression 33 

4.7 Effects of Ordering on the Response Time .... 40 

4.8 A Comparison of Host's and Backend's Ordering 

Capability 41 

4.9 Effects of Ordering on Ordered and Random 

Relations 43 

5.1 25? Projections on 5? Selections 43 

5. 2 25? Projections on 10? Selections 49 

5. 3 50? Projections on 5 % Selections 50 

5.4 75? Protections on 5% Selections 51 

5.5 Effects of Differing Projection Percentages for 

200 -byte Tuples 53 

5.6 Effects of Differing Prcjectan Percentages for 

10 00- byte Tuples 54 

5.7 Comparison cf Projection and Selection for 

100-byte Tuples 55 

5.8 Comparison cf Projection and Selection for 

200-byte Tuples 56 

5.9 Comparison cf Projection and Selection for 

1000-byte Tuples 57 

5.10 Comparison cf Projection and Selection for 

2000-byte Tuples 58 



6 



AC KNOWLEDGEMENT 



<v 


he 


exp 


c riment 


s 


dsscr: 


.b 


ed 


in 


thi 


s 


thesis 


are 


the 


r a 


suit 


5 


of th 


£ 


cc cr 


dina ted 


o 


f forts 


of 


man 


y i 


rt 


di vidual 


o • 


For 


emo 


st 0 


f 


these 




ndiv 


i duals 


i 


s cer x 


:a 


in 


lv M 


s. 




Paula S 


ara w 


s fr 


o 


f Th 


o 


Oh i c 


s 


ta te 


Uni ver 


si 


ty and 


■u 


he 


Na v 


a 


1 Pcstgr 


adaa 




Sc 


hool 


• 


Ms . 


Sh 


raws 


sr's dil 


igence , 




dedi 


cat 


i 


on, and 


gu 


i da 


nee 


hav 


e 


prove 


n 


inva 


1 ua b 1 e 


i n 


bet? 




th 


e re 


sea 


r 


oh prior 


t c 


a r 


d s 


ubse 


- 


quer.t 


p 


repa 


ration 


of 


thi s 


u 


he 


sis . 


















L 


ik 


ewis 


e, Ms. 




Doris 


M 


le 


czko 


an 


d 


her st 


af f 


a r 


rh- 


Dar 


9 



Processing Service Center West at Point Mugu, California, 
have provided much assistance, and they have proven flexible 
enough no accomodate our sometimes inflexible requirements. 
Gratitude is also expressed to Hr. Ben Torres and his staff 
at the Computer Operations Center at Point Mugu. 

Finally, to Commander Tom Pigoski of the Naval Security 
Group Command is offered a SDecial thanks for his continued 
support in providing the necessary assistance both to keep 
this project going and to enable the results of this 
research to be presented at the International Workshop on 
Database Machines in Munich, September, 1933. 



7 



I. introduction 
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of the computer system perf oraanc?. 

1 . A Definition 

Eenchmarkina is a term used throughout ths industry 
in a myriad of differing contexts. In each case -he ulti- 
mate goal is to make ar. independent measure cr reinvent 
comparison of machine capabilities. These comparisons or 
measures could be anything from the throughput to the speed 
cf calculations by a certain internal component, but in the 
final analysis seme measure or evaluation of performance is 
desired . 
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There are many different ways of evaluating machine 
performance. Many manufacturers provide the capability of 
attaching monitoring systems ~o their equipment. Th a se may 
be either hardware monitors, which physically sense the 
action occuring in the system and icsep statistical records, 
or they may be software monitors which attempt to perform 
the same function with software hooks that keep track of the 
system operation and give the operator a statistical 
analysis of the machine action and performance. Software 
monitors have the disadvantage of using a good deal of the 
system time just for their own operation. Though hardware- 
monitors do not suffer from this disadvantage, they require 
the wiring of the monitor system into the hardware. The 
biggest disadvantage to these types of measurements, 

however, is the inability to make comparisons on differing 
machine configurations and between different manufacturers. 
Benchmarks attempt to solve this problem by forming some 
standardized testing methodology that is easily transpor- 
table from one machine to another machine. Mos^ 

importantly, the measurements made must be r °1 e v - n t regard- 
less of the machines benchmarked and give an accurate means 
cf comparison between these machines. 

Therefore, benchmarks are defined to be certain sets 
cf instructions that will test all the capabilities cf a 
machine and yield seme generic set of data that will give an 
accurate measure of that machine in its tested configura- 
tion. This data will then give the observer specific 
guidelines for making relsvent and general comparisons with 
similar machines and configurations. 

2. Database Mac hine Benchmarks 

With the advent of special-purpose database machines 
and backend database machines, a new field of application 
for benchmarks exists. Previously, benchmark routines nave 
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been used exclusively for the testing and performance evalu- 
ation of large general-purpose mainframes. With the proli- 
feration of backend processors to unload specialized tasks 
from the mainframe, these benchmarks have been ineffective, 
because the computer system's capabilities of performing the 
specialized tasks are net benchmarked. Our primary concern 
is with the benchmarking of specialized backends known as 
database machines. In this context we mean a specialized 
processor externally linked to a mainframe, with its own 
special-purpose hardware and software for database manage- 
ment. Eackend refers to this externa lly- linked and 
specially-built machine. 

3 • Ob j ect ive s 

At present the backend database machine is in its 
infancy in the ccmmercial marketplace. Nevertheless, the 
database system is extensively utilized in various forms and 
for different tasks, exclusively in some software configura- 
tion operating on a large general-purpose machine. In erder 
to provide effective database functions the so ft ware- laden 
database system consumes a great deal of the mainframe's 
resources which severly limits the usefulness of the 
mainframe for ether functions. 

This has started a trend towards the backend data- 
base machine, one that can reduce the time -he host spends 
in searching and updating data in response to user queries. 
This greatly increases the ultimate usefulness of the host, 
since these backend database machines are only a small frac- 
tion of the total system cost. The database machines now on 
the market have been implemented using microprocessor tech- 
nology rather than fully -special! zed hardware, thereby 
keeping their costs down. As the market expands ana more 
progress is made in VLSI technology, we can expect tc see 
more specialized hardware at even lower cost. 
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Car objective hers is to develop seme basic testing 
procedures tc benchmark relational database machines. This 
thesis also gives account of test results performed on a 
specific backend database machine, the RDM-1100, and its 
various configurations. If is limited to the results of 
test queries in the operations of selection and projection 
and ordering capabilities. In addition * 0 this thesis, 
there are three ether theses, [Refs. 4,5,6], which describe 
in detail the test procedures and results of join opera- 
tions, the generation of the databases used in the 
experiments, and the other test procedures and results. The 
ultimate goal of the entire project is tc develop and iden- 
tify some sets of gueries that can be used in evaluating 
database machine performance. 

B. TBE BENCHMARKING ENVIRONMENT 

Our primary emphasis is to evaluate the performance of 
the system/machine under typical operating conditions. In 
this sense a standardized workload model must be developed. 
This includes the use of typical user gueries (transactions) 
in addition to the design of a database. In terms of th<= 
database, we developed a paramater i zed database generator 
that will generate cur dataoases with attributes according 
to a specified format and with values from well-defined 
domains according to specific distributions . We chose this 
approach so that we could predict or interpret accurately 
the results of any given query. More details are given on 
the context and design cf the database in Chapter II. 

Query streams are developed to test the full range of 
possible user operations. All queries are in forms of 
selection, projection, cr join operations as may be made by 
a typical user. The actual query syntax and selection cf 
query streams is discussed further in Chapter III. 
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In addition, the environment available to us for the 
test runs is very restricted. There are no hardware or 
software probes available at the time of testing, nor any 
statistical information on the backend machine. Our only 

recourse is to use a built-in retrieve function that will 

give a readout of th<= database machine clock. 
Unfortunately, the clock has a low resolution, 1/60 of a 

second. A system call is executed to retrieve the time 

before and after each test query, thereby providing a crude 
yet consistent time measure. 

1 . The Host 

The actual testing is dene using a UNIVAC 1100/42 
host system. The system is located at the Pacific Missile 

Test Center, Point Mugu, California. The basic database 
machine used is the RDM- 1100, which is a Britton-Lee IDM-500 
modified to run as a backend to UNIVAC 1100 computers by the 
Amperif Cert. of Chatsworth, California. 

The testing is done using run-stream queries in an 
interactive environment. These queries are run either on 
site at Ft. Mugu, cr from a remote terminal set up at the 
Naval Postgraduate School, Monterey. He prefer to run the 
test gurries in a stand-alone, single-user mode in order to 
minimize the effects of workload variability of the host 
machine. in the event that the queries are not run stand- 
alone, the number of coincidental users is very low and 
little or no difference is observed in the measurement from 
one run tc another. 

2. The Host Interface 

The interface between Univac and the RDM is via a 
word channel; the ECM is treated as an I/O device by the 
UNIVAC mainframe. Ihe standard IDM device is capable of 
communicating over an HS-232 serial interface or an IEEE-488 
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parallel interface. The communication board of the IDK at 
Pz. Hugu has been modified to be compatible with the Univac 
system. It supports byt e/word channel interface with a 200K 
byte/seccnd capacity. 

Ihe driver routines on the Univac host handle the 
parsing of the user queries, and translate them into ‘•.he IDM 
internal format. The host also handles the communication 
protocol with the backend machine. The backend, in addition 
to performing the necessary handshakes, will perform the 
required error checks and cause the host to retransmit in 
the event that an error is detected. 

3. Machine Con f i gur at i c ns 

The IDM-500 system comes with different amounts of 
internal cache memory, and has an optional accelerator 
board. The accelerator is a high-speed processor designed 
to perform certain common relational functions in order to 
increase the overall system performance. The machine can be 
configured to hold 1-6 megabytes of information. We have 
run tests on the following configurations: 

(1) 1/2-megabyte cache without accelerator; 

(2) 2-megabyte cache with accelerator; 

(3) 2-megabyte cache without accelerator. 

The first of these configurations is no longer marketed. 
The standard package contains 1-meganyte of cache memory and 
no accelerator. In addition, the machine used in our -ests 
is linked exclusively to the Univac 1100, and is equipped 
with only one disk controller, with access to two 
600-megabyte disks. 
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C. THE EENCHM AR KED MACHINE 

We chess to restrict our work to the IDM-500, a rela- 
tional database machine. This type of machine is relatively 
new on the database market. Although it is no* clear that 
it will be the predominant database machine architecture, 
the latest literature and current trends appear to indicate 
that it may play an important role, at least in the short 
run . 

The relational model is intuitively easier tc use and 
understand -ban ether database models, and it appears that 
it will significantly contribute to lower software develop- 
ment costs. Nevertheless, fully-implemented sc f tw ar- 
tel at ion al database management systems have severe perfor- 
mance problems. The high cost of performing relational 
operations, most strikingly the join and projection 
operations, underlies the problem. 

With the great interest in the relational database 
models and the advances in technology that permit the use of 
special-purpose processors and backend systems tc perform 
the majority of work, we feel that the relational database 
machine will play an important role in the database manage- 
ment market. The Eritton-Lee IDM-500 is one of the first 
machines to take advantage of this technology and incorpo- 
rate it into a relational database system which can be used 
as a backend tc a variety of mainframes. 

1 . Modular Des ign 

The 3ritton-I.ee IDM-500 is a backend relational 
database machine that can be linked to one or more host 
computers. Amperif Corp. markets this system under an OEM 
agreement as the RDM-1100. Essentially, the system is a 
Erittcr.-Lee IDM-500 with Amperif providing the host and 
backend interface software tc communicate with the Onivac 
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1100 and a host -int erface module. Figure 1.1 depicts the 
architecture of the Eritton-Lee machine. From new on we 
will use IDM-500 and RDM-1100 interchangeably. 

The backend is a modular, expandable, 

microprocessor-based system organized around a central high 
speed bus. Each module is functionally oriented. 

2 • lichnolccj^ and F ur.c ticnality of Modu les 

The EDM-1100 is made up of six basic modules organ- 
ized on a central high speed bus ( see Figure 1.1 again ) . 
The modules perform the following functions: 

a. The database processor 

The database processor, a Z8000-based micropro- 
cessor, supervises and manages all system resources. This 
processor executes most of the software in the system. 

b. The database accelerator 

The database accelerator (an optional processor) 
is a high-speed processor with an instruction set specifi- 
cally designed to perform and optimize certain functions. 
It is activated by the database processor as appropriate. 
The accelerator has a three-stage pipeline which executes 
instructions at up tc 10 MIPS. This processor can initiate 
disk activity and process data at disk transfer rates. The 
accelerator and ■‘■he RDM software are so configured that the 
majority of database work is performed by the accelerator 
under the direction of the database processor. 

c. The main memory 

The RDM main memory, or cache memory, is 
composed of 64k-bit dynamic RAM chips. The RDM car be 
configured with from 1-megabyte to 6-megabytes of memory. 
This memory is utilized for RDM system code, disk buffering, 
indices, and user commands. 
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Figure 1.1 The RDM-1100 Relational Database Computer. 



a 



The internal bus 



The entire system uses a common internal bus 
system for inter -pro cess or communication and data transfer. 

e. The disk/tape interfaces 

The system can be configured with up to 4 disk 
controller modules. Each controller can manage from one to 
four disk drives. The disk controller moves data between 
external disks and the RDM main meraerv. The disk controller 
is designed to work with the accelerator which can process 
data at disk transfer rates. An optional taps control 
module supports up to eight tape drives, which can be used 
for direct disk-to-tape backup, data loading, and RDM 
software leading. 

f. The host interface 



The RDM and the hes- (s) communicate via the host 
interface module. This module accepts commands from one or 
more hosts, performs error checking, causes 1 h:- host tc 
retransmit if an error is detected, and informs the database 
processor that it is moving a command into the cache. Each 
host interface module can handle up to eight hosts. Hence, 
with the full 8 interface modules, a maximum of 64 hosts can 
be accomodated by the RDM. The standard interface module 
supports both PS-232 serial interface or an IEEE-488 
parallel interface. 
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II . THE DATABASE 

In our benchmark measures on the RDM-1100, it is impor- 
tant to model the queries or transactions to be processed, 
and tc medal the database. The performance of any database 
system depends not only on the characteristics of the data- 
base system, but also on the size and s-ructure of the 
database. Considering this two-dimensional problem, we wan” 
to build databases where the values for each attribute may 
be selected from well-defined domains. In addition, we feel 
that these values should have specified and well-formed 
distributions to aid in the prediction of -he response sa- 
fer any given query. 

We have built a parameterized relation generator, a 
software system to generate relations for synthetic data- 
bases. These synthetic databases are then used by our query 
stream tc simulate the activity of actual users on the 
sytem. Several of these databases are built, varying the 
tuple widths as well as the number of tuples per relation. 
We then attempt to distribute the databases on the disks tc 
force specific actions on the processor, such as join opera- 
tions between relations on the same or seperate disks. In 
this manner we seek tc find any significant difference due 
to the distribution and location of the data on disks. 

A. THE USE OP SYNTHETIC DATA 

As with any system model, it is important that the 
synthetic data adequately represent the essential character- 
istics cf real databases. 3y utilizing the synthetic 
database, we can represent a subset of the real-world data- 
base and save time and space for not accommodating the full 
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set cf the real-world database. However, the organize- ior. 
is general enough to provide an emulation of the real world. 
The synthetic databases we have designed includ the basic 
data types that would exist in a real-world database: 
integer, character, and so on. For attribute values we have 
incorporated both sequential and random orders, as well as 
groupings according to specific discrete distributions. 
These are more fully described in the next section. Using 
this format we can net only accurately predict the outcome, 
i.e. amounts of data returned by a query, but we can also 
easily reproduce the databases on other systems for further 
tests . 

B. GENERATION OF THE SYNTHESIZED DATA 

When designing the database, our first concern is with 
the physical sizes that should be used. The relations must 
be large enough to test the full capacity of the system, 
and meaningful enough to include various attributes. For 
example, we choose tuple widths cf 100, 200, 1000, and 2000 
bytes with the maximum tuple width being limited at 2000 
bytes and the disk access being performed in 2k blocks. 

Our seccnd consideration is how large the relations 
should be, i.e., how many tuples per relation. Again, in. 
order to test the system for both large and small relations, 
we decide or. relations with 500, 1000 , 2500, 5000, or 10000 
tuples. These are arbitrary decisions. The relation sizes 
are multiples of the smallest number in order to facilitate 
comparisons cf the test results. 

Our n e xt consideration is the actual design and building 
cf the data generation tool. We envision a great many data- 
bases with differing configurations. Thus, an interactive 
interface tc a generation program appears to be the most 
effective approach. Using the locally available IBM 3033 
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integer can be assured. The integer can be either cr.?, *wo, 
cr four bytes. Character-strings car. also be chosen, either 
compressed cr uncompressed," in a collating sequence or in 
some random order. Character string values can also be 
selected from enumerated domains either randomly or 

according to a specific discrete distribution. In our 
prototype the discrete distributions are limited tc multi- 
ples cf 5^. The user is also given the opportunity to set 
the naming convention for each relation and its attributes. 
The prototype is designed and implemented with a limited set 
of alternatives. It is however modular for adding alterna- 
tives to the prototype, such as exponential or normal 
distributions. 

We use a standard template for each tuple width. A 
pcrticn cf this template is standard for each relation ( see 
Figure 2.1 ). Each relation contains: a sequential-integer 
attribute, a 4-byt e-in tsger, ’key 1 ; a character-attribute 
'mirror', which is identical in numerical value to ’key' but 
stored as a character string and not as- an integer: a 

random- integer-attribute 'rand' of 4-byte integers; end a 
character-string-attribute 'chars', which contains 
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Figure 2. 1 Tuple Templates. 
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characters in a collating sequence. The number of charac- 
ters in •chars' is dependant on the tuple width, in order to 
ensure tha J- tuples are exactly -100, 2 00, 1000 , and 2000 

bytes wide. The length of 'chars' is set to the precis? 
number of characters required to ensure that the ~upie is of 
the proper width. The random field is present to aid in 
randomizing the order of the tuples and the purpose of the 
mirror field is to compare the performance of identical 
retrieve operations based on queries qualified on the 
sequent ia 1- inte ger-attribut e, ’ key * , and the character- 
attribute, 'mirror' . The 100-byte and 200-byte tuples also 
contain a sequential-unit-letter field of 1-byte character 
in collating sequence, 'letter', and a unique 

random-integer-attributa of 4-byte integers, 'uniqrand'. 

Each template is then filled out with attributes for 
which the values are chosen from a number of enumerated 
values. For example, the P10 attribute specifies attribute 
values with a uniform distribution over ten unique values. 
A retrieve statement with one qualifier could then be 
written tc retrieve 10"? of the tuples in the relation. The 
number of such fields is dependent on the tuple width. 

Once the design cf the databases is complete, multiple 
instances cf each relation are built using the interactive 
generation tool on the IBM 3033. The relations are then 
transferred to tape storage fcr transport to Pt. Kuqu and 
the UNIVAC 1100. The data is leaded ente the UNIVAC 1100 
disks and then loaded to the backend database machine using 
a bulk-load utility. 

Teste are planned on the basis of an assumed capability 
to ccntrcl the distribution of the data on the RDM 1100 
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block allocation for relations within the database is cont- 
rolled within the database machine, and is not predictable. 
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III. THE QUERY LANGUAGE 



The interaction ter ween the user ana t he RDM- 11 00 is 
through the software interface, RQL (relational query 
language) , provided by Amperif. The interface translates 
the user's RQL command into the backend-machine's internal 
format and sends the formatted command to the RDM- 1100. 
The software requirement for the host is minimal, and the 
backend machine is independent of the host. 
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runs in a stand-alone, single user environment to minimize 
*he host workload variability, we are forced to execute cur 
run streams during the evenings and on weekends. In addi- 
tion we want to run sets of tests over several system 
configurations. This again reduces the overall time for us 
to run cur performance i «sts on each configuration. 

Additional constraints are imposed by the nature of the 
interface software provided by Amperif and by the configura- 
tion of the machine at Pt. Mugu. Pre-compilat icn of the 

queries is not supported. Me therefore have chosen to use 
the s* ored-co ramands facility of the backend machine to 
reduce varability in the parsing time. The stored- commands 
facility allows the user to store the parse-trees produced 
by the interpreter as named commands in a relation in the 
user's database. When these stored commands are invoked at 
a later time, the parsing is reduced to a minimum. Using 
the stored-command facility also eliminates the time 

required to look up target-list and qualification attributes 
in the data dictionary. 
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A. SYNTAX AND SEMANTICS 



The basic operations involved in retrieving data in a 
relational system are selection, projection and ioin. This 
section will provide a basic overview of the syntax of the 
Relational Query Language (SQL) , with pertinent examples. 
For a more detailed explanation of the language as well as 
the database admin st rater functions, please refer to 
[Ref. 5]. This thesis focuses exclusively or. the seiec~ion 
and projection operations. The interested reader is encour- 
aged to read [Ref. 4], for an explanation and evaluation of 
the join operations as performed on the RDM- 1100 and its 
various configurations. 

Simple selection in SQL is expressed as follows: 

RETRIEVE ( A. ALL ) WHERE A. CITY = "CHICAGO" 

The keyword to the selection operation is RETRIEVE. The 
relation referred to in this case is A ar.d the qualifier ALL 
indicates that all attribute values, i.e. the entire tuple, 
are to be returned for each qualifying tuple. In this 
example an optional Qualifier consisting of a single predi- 
cate has been added, WHERE A. CITY = "CHICAGO". This 
qualifier restricts the tuples returned to only those tuples 
in which the city attribute has a value of "CHICAGO". The 
qualifier cculd have multiple predicates, related by any of 
the boolean operators, such as AND, OR, = (EQUAL), != (HOT 
EQUAL), etc. An example is: 

RETRIEVE (A. ALL) WHERE A .Cl TY = "CHICAGO" OR A. CIT Y= " MONTEREY” 
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In this case the backend machine will return all the tuples 
in the relation A in which the city attribute has either the 
value "CHICAGO" or the value "MONTEREY". 

The selection operation restricts the tuples to be 
returned. The projection operation restricts the attribute 
values cf a tuple; only a portion of the attribute values of 
each tuple are returned. For example; 



RETRIEVE (A. CITY, A. NAME) 

In this case, the target list (A . CITY , A . NAME) , specifies the 
attribute values tc be projected out of the tuple and 
returned to the user. Only the values of attributes CITY 
and NAME for each of the tuples in the relation A will be 
returned. A qualifier (r.ot shewn) could be added as in a 
previous example to limit the number of tuples returned to a 
specific subset of the relation. 

Commands like these make up the bulk cf the queries used 
in the selection and projection tests, with varying quali- 
fiers attached. RQL has many more capabilities, such as the 
aggregate functions and the EY clause. For further details, 
again refer to [Ref. 5], 

E. TEST QUERIES 

The test queries used are all selection and projection 
operations in the form of the previous two examples. 
Qualifications are used on these queries to select given 
percentages of the attribute values, as well as given 
percentages of the tuples in each relation. As described in 
Chapter II, single qualifiers are used on the attribute 
values having discrete distributions to select only a given 
percentage cf each relation. Comparisons are made on the 
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as the percentage cf 



tack end database machine's performance 
data retrieved is varied. This variation ccvc-rs two dimen- 
sions: the percentage of tuples in a relation and the 

percentage cf attribute values in a tuple. Additional 
testing is dene on single-tuple retrieves and queries using 
range predicates on the key field. Each of these experi- 
ments is described in further detail in Chapers IV and V 
along with a detailed description of the commands used tc 
retrieve the data. 

1 • liming Consi de rati o ns 



As mentioned before, the most critical restriction 
placed on the performance tests is the lack of measurement 
tools. There are no monitors available tc keep track of CPU 
or I/C activities in the backend database machine. The only 
available measurement capability is a measurement of elapsed 
time that cculd be extracted from the backerd database 
machine clock, which has a resolution of 1/60th cf a second. 
Our prime concern in this performance evaluation is to 
determine the effects cf varying certain parameters cr a 
backend database machine and gather some gross overall 
measures. In this sense, therefore, we feel tha~ the rough 
measurements afforded by the backend machine are still 
acceptable fer our purpose. 

In erder to determine the elapsed time in processing 
a query, a retrieve command tc extract the time from the 
backend database machine clock is executed before ar.d after 
each guery. The retrieve command is of the form: 



RETRIEVE ( TIME = GETTIME () ) GO 



GETTIME is a system function of the backend machine. This 
command is used to print a time, in 1/60 second increments. 
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before and after our queries. Using this throughout cur 
experiments we can get gross, yet consistent measurements of 
total time required to execute the queries. Even with this 
poor resolution, the comparison of identical queries will 
yield relevant performance comparisons of the response time 
of the backend machine. 

2 • Objectives 



The final objective of these tests is not to 
generate large volumes of data with figures cf retrieval 
times for particular gueries. Our primary goal is tc make 
relevent comparisons of the machine performance as the 
gueries are varied inside specific parameters. To this end 
we hope to make some judgements of the overall performance 
cf this particular backend database machine, but more impor- 
tantly tc gain some insight into the testing methodology for 
backend database machines in general. In the next chapters, 
examples of the run-streams used in the experiments are 
given along with graphical representations cf the test 
results . 
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IV. PERFORMANCE EVALUATION OF THE SELECTION OPERATION 



A. DEFINITION OF A SELECTION 

Selection is a means for the user to retrieve and 
examine pertinent information from a relation. The user may 
select the entire relation or he may restrict the informa- 
tion returned to him in two ways. He may limit the number 
of tuples returned fcy adding a qualification to the selec- 
tion operation. The qualification will limit the tuples 
retrieved to those whose attribute values satisfy the condi- 
tions of the qualification. Qualification consists of 
predicates, assertions or. the attribute values of the tuple 
cr tuples. Multiple predicates may be combined with boolean 
operators, such as AND, OR, EQUAL, NOT EQUAL, etc. The user 
may alsc restrict the attribute values returned by expli- 
citly listing those attributes which he desires, a 
projection of the relation. This is further described in 
the following sections of this chapter. 



B. SELECTIONS IN THE QUERY LANGUAGE 

In EQL the user is given ccnsidsraole pew er of selection 
through use of the RETRIEVE command. Using the 100-byte 
relation described in Table 2.1 as a format for a relation 
A, a typical RQL selection command might be: 



RETRIEVE < A. ALL ) WHERE A. KEY = 25 



In this command the keyword RETRIEVE is used to signify 
selection, the A. ALL indicates that all attribute values 
i.e., entire tuples, are to be returned, and the keyword 



28 



WHERE identifies the quantifier. The A. ALL may be replaced 
with an explicit listing of those attributes desired. The 
attributes may be listed in any order the user desires. 
Osing the key word WHERE and a qualification, the user may 
then indicate which cf the tuples are to be returned. In 
this example, only those in which the KEY field is equal tc 
25 are returned. The user may use other operators such as < 
or >, and is given the option to use more than one predi- 
cate. For example: 

RETRIEVE ( A. ALL) WHERE A. KEY > 25 AND A. KEY < 100 

would return all tuples with the KEY field in the ratae 26 
through 99. The user is given great latitude in delimiting 
the subset cf the relation he desires. For more detailed 
information concerning the capabilities and syntax, rhe 
reader is encouraged to read [ Ref . 5]. 

C. AN ENVIRONMENT FOR THE MEASUREMENTS 

Ihe resulrs discussed in this chapter are from tests 
performed or the system configuration with 2-megabyte cache 
memory and rhe optional accelerator. Lack of time prever.red 
a significant number of tests on alrernate configurations 
for comparison. However, these tests can be conducted on 
ether configurations without modifications. 

As described in Chapter III, the riming measurements are 
the backend system's response to a retrieve for its internal 
system clock time in 1/60-second resolution. In most cases 
the measurements are based on single queries due to the time 
involved. Some measurements are averages over several query 
responses; these are dif f erentiarsd in the sections which 
follow. In all cases the rests are runs performed in the 
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evenings and weekends with virtually no other users or. the 
sytem . 

D. SELECTICN MEASUREMENTS 

The figures in the first section represent results gath- 
ered for selections with and without indicies. The number 



of tuples 


returned 


is restric 


ted to a fixed 


proportion 


the total 


number of 


tuples in 


the relation; no 


projection 


involved . 


The final 


sec tions 


give comparisons 


of the sys 



ordering capabilities on the fro n tend as well as the 
backend, and the effects of data compression. 

1 • The Percentage of S elect ion 

Figures 4. 1 and 4.2 show the system response time 
for selection. Figure 4.1 shows measurements on a database 
with no indicies; Figure 4.2 shows measurements on a data- 



base with 


a non-cluster 


ed index on the 


P5 


and 


PI 0 


attributes . 


As described 


in Chapter II, 


the P5 


and 


PI 0 


attributes 


are attributes 


whose values are 


in a 


unif 


or m 



distribution over the corresponding percentage. The P5 
attribute values will be 20 unique values each appearing in 
5% of the tuples and the P10 values are 10 unique values 
each appearing in 10^ of the tuples. The queries used are 
qualified cn the P5 attribute. Therefore, for each query 
the system will return exactly 5 % of the tuples in the 
relation . 

As evident in Figure 4.1 the system response time 
increases nearly linearly as the amount of data returned 
increases. As expected, the larger is the tuple size; the 
steeper is the slope, since the volume of the data increases 
more rapidly for the larger tuple size. 
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Figure 4. 1 Simple Selects with no Indicies 
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Figure 4.2 5%-Selects with Nor-Clustered Index on P5 and P10. 



Figure 4.2 shews the results of the same queries rur. 
against a database with indicies on the P5 and P10 at~ri- 
butes. Cc sparing Figure 4.2 and 4.1, we notice that the 

overall times are greatly reduced. The graph still shews 
nearly linear relationship of the increasing response tiro 3 
ar.d of the increasing volume of data. Further discussions 
of the effects of indicies follow in the next section. 

The linearity of the response time appears to indi- 
cate that t he system performance is bound by the spe=d of 
the channel between the host and the backend. The larger 
the volume of data is to be returned; the longer the channel 
will be active in order to transfer the data. 

2 . Ef fect s of Cl nst at e d and Non -Cl ust ered Indicies 

The RD N- 110 0 supports two types of indicies, clus- 
tered and ncr.-c 1 uste red. Creating a clustered index causes 
the tuples to be ordered by KEY for storage. A sparse index 
containing one entry per block is built. A non -clustered 
index, on the other hand, contains a unique entry for each 
tuple in the relation. No ordering of tuples within th c 
relation is implied. 

Figure 4.3 shows response times for the retrieval 
query with r.c qualification, but with an ordering specifica- 
tion. The queries are of the form: 

RETRIEVE (A. ALL) ORDER 3Y A. KEY 

where A is the relation name and KEY is an attribute in A. 
In an ordered retrieve, the tuples are sorted in the backend 
machine and then sent to the host for display. Similar 
queries are run against a relation with no index, a relation 
with a r.cn-clustered index on the KEY attribute, and a rela- 
tion with a clustered index on the KEY attribute. The 
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Figure 4.3 Ordered Retrieves with Indicies on KEY 



response rimes are similar throughout the range of relation 
sizes. The indicies, clustered or non-clustered , provide no 
significant improvement for this range of relation sizes. 
The expecfed results would have shown a significant improve- 
ment for the relation with a clustered index. The 
similarity in response times may indicate that the EDM sorts 
the the tuples, even though the tuples have been in sorted 
order due to the use of a clustered index on the ordering 
attribute . 

Figure 4.4 shows the results of test runs on rela- 
tions with and without non-clustered indicies on the P5 and 
P10 attributes. The graph shows a significant improvement 
in response times fcr the relations with th<= non-clustered 
index. Locking at Figure 4.5, the improvement ratio is made 
more evident for simply qualified retrieves when the index 
is on the attributes used in the predicates cf the qualifi- 
cation. The larger is the tuple size; the greater becomes 
the improvement. The 200 -byte tuple shows a nearly 957, 
increase in the response time. The other tuple sizes show 
similar improvements. 

3 . E f f set s of Data Com press i on on Sele ctio n Qu eri es 

The backend database machine has the capabili-y of 
storing character strings in either compressed or uncom- 
pressed format. k character string in compressed format is 
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Figure 4.4 Fetrieves with and without Indicies 
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Figure 4.5 Speed Improvement using Non-Clusters 
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Figure 4.6 Effects of Data Compression 



Here specifically. Figure 4.6 shows th** results of 
the tes~s for the relations of 100-byre tuple size and she 
2000-byte tuple size, respectively. For the 100-byts tuple 
the storage requirement is reduced by approximately 50 ^ when 
all attributes are fully compressed. In the case of '■he 
2000-byte tuple size, the savings in storage is 
approximately 90%. 

The graph shews a major improvement in the response 
time for compressed relations. From the steep slope of the 
line it appears evident that the greatest impact on system 
speed is the amount of data that must pass over the internal 
bus. The large reductions ir. tuple size for the compressed 
relation shews a clear advantage over the uncompressed rela- 
tion. Th a delay becomes increasingly significant for 
relations of larger tuple sizes. Approximately, a delay 
factor of 10 for the larger tuple size and 10000-tuple rela- 
tion is observable. 

4 . E ffect s of Ordering and Rando mizin g the Dat abas e 

E r.t r ie s 

Figure 4.7 shows the results of tests be measure the 
backend system’s sorting capabilities. The relations used 
are stored in the backend; their tuples are ordered on their 
KEY attributes. The graph depicts retrieves with and 
without ordering specifications on the KEY attribute. There 
is a slight increase in the response time for the ordered 
retrieves, as might be expected. The differential line 
depicts the extra time necessary for the ordering, which 
increases as the relation size increases. 

Figure 4.8 shews the cost of performing the ordering 
or. the backend versus the host. In this case batch runs on 
the host ar* used to perform the queries. In general, the 
batch retrieves show a marked improvement in response time 
for identical queries over the run-stream queries used in 
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Figure 4.7 Effects of Ordering on the Response Time 



2-megabyte cache wtth accelerator 



O 

o 




<D 

N 

</> 

C 

o 



d) 

L. 



t 

>1 

4 ' 



(Das) ami; esuodsej 



41 



Figure 4.8 A Comparison of Host's and Backend's Ordering Capabili 



Figure 4.7. This may be due tc the decreased overhead cose 
for batch versus an interactive environment. Figure 4.8 
also shews that for smaller-size relations the backend 
performs a more efficient ordering than the host dees. Even 
for larger relations the sort time of the host and the sor + 
time cf the backend are comparable. 





Finally, Figure 4.9 shows 


the effect cf 


randomizing 


the order of the tuples in the rela 


t ion . 


Using 


the randem- 


number 


attribute to scatter the 


tuples 


in the 


r ela-ion , 


similar 


retrieves are performed on 


the ordered 


and random- 



ized relations. In this case there is a non-cluster ed index 
on the KEY attribute for the relations. The graph shews 
miner variances in response times between the two, clearly 
indicating that the order in which the tuples are stored is 
net a significant factor in response time for the ordered 
retrieves. 

E. CONCLUSIONS 

The response times are generally linear, increasing as 
the amount of data tc be returned is increasing. The amount 
cf data may be varied as the number of tuples in a relation 
or the width of the tuples. 

The creation of indicies on tuples shows significant 
improvement in response times when the retrieve command is 
qualified on the indexed attributes. The ir.dicies provide 
marked improvement as the tuple size increases. 

The effects of data compression shows some interesting 
results. Figure 4.6 has shewn a very large improvement for 
compressed tuples. This improvement is most likely attribu- 
table tc the decrease in the number cf disk blocks accessed. 
In fact, the difference in time is proportional to the 
decrease in the number of blocks used for the tuples. 
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Figure 4.9 Effects of Ordering on Ordered and Random Relations. 



Finally, the ordering test shows that the backer.! car. 
sort tuples at least as fast as the host can. Naturally, 
the major portion of the time is spent in transferring the 

the backend; but, 
efficient for -he 

smaller size or relations. 



data from the disk to either 
nevertheless, the tack sr. d 



the host o: 
proves more 
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V. PERFORMANCE E VALUAT ION OF PROJECTION OPERATION 

A. DEFINITION OF A PROJECTION 

Projection is a means to restrict the amount and to 
order the sequence of information returned to the user in a 
retrieval operation. More specifically, projection will 
restrict the attribute values that will be returned from 
each tuple selected. Projection and selection can be 
combined to limit the range of values returned. In addi- 
tion, a user can rearrange the ordering of the attribute 
values as the relation is displayed by varying rhe order of 
the attribute names in the target list. This is not say 
than the actual order of the stored relation is altered but 
that the subset displayed to the user is ordered according 
to his specifications. 

B. PROJECTIONS IN TEE QUERY LANGUAGE 

In F CL the user is given considerable latitude to 
describe precisely which attribute values that he wants to 
be returned. Using the 100-byte relation described in Table 
2.1 as a format for a relation A, the RQL command: 

RETRIEVE ( A. KEY, A. MIRROR ) 

will return to the user only those attribute values in the 



relation 


A who 


S 8 


attribute 


names are KEY 


and 


MIRROR. 


The 


user can 


list 


as 


many att 


ribute names as 


ha 


desires 


and 


place them in 


any 


order in 


the target list 


of 


the RETRIEVE 


command. 


In 




case w her 


e all attribute 


va iu 


es cf a 


rala- 


hi on are 


to be 


listed, the 


user may simply 


us- 


A. ALL. 


All 
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attribute values, i.s., entire tuples, will be re- nr red ir. 
order as they are stored. The user can also add qualifiers 
to restrict the number of tuples returned. These qualifiers 
need not te on the attributes listed. For example, 

RETRIEVE ( A. KEY, A. MIRROR ) WHERE A.?5 = "RED" 

will again return to the user only those attribute values in 
the relation A whose attribute names are KEY and MIRROR. In 
addition, the qualifier will restrict the tuples returned to 
those whose ?5 attribute value is RED. This RETRIEVE 
command also illustrates the means to perform a percentage 
selection. The P5 attribute values are colors selected from 
an enumerated set. Each different color value in the ?5 
attribute is present in 5% of the tuples in the A relation. 
Using these known percentages, the P5 qualification will 
select exactly 5?? of the tuples in relation A. 

C. AN ENVIRONMENT FOR THE MEASUREMENTS 

The projection measurements discussed here are all on 
the same system configuration with 2-megaby-e cache memory 
and the optional accelerator. Lack of time has prevented us 
from obtaining measurements on ether configurations. 

The projection measurements are conducted for four tuple 
sizes, i.e. 100-byte, 200-byte, 1000-byte, and 2000-byte, 
in three percentages of returns, 25%, 50%, and 75%. These 
percentages refer to the number of attribute values in th Q 
tuple that is returned. With the exception of the 100-byte 
tuple size, these are exact percentages; in the 100-byte 
case, the number of attributes returned was 29% and 71%. 
This is due to the tuples in the 100- byte relation having 14 
attributes. A strict percentage of 25% and 75% was not 



46 



attainable. Nevertheless, they are still referred + c as 25? 
and 75% projections. Further, the retrieval commands are 
qualified by 5% and 10% selections in order to reduce 
further the amount cf data to be returned. Each query is 
executed 10 times, each time with a different qualification. 
This is dene to eliminate any effects due to ~h a location of 
the data in the relation and provides a better average 
response time. 

D. PBO JECTICN MEASUBEHENTS 

The test queries used are qualified on th» P5 and ?10 
fields cf the relation to perform the aforementioned selec- 
tion. Each query is then repeated 10 times with a differer.- 
gualifier. The figures represent the average response time 
for those ten tests. Each graph shows the response time in 
seconds plotted against the number of tuples in the 
relation . 

1 • Per cen t age of Proi e ct i ons on No r-K= v Attr ibutes 

In general the difference in response times for the 
five-percent and ten-percent selections is negligible, this 
is particularly true for the smaller-size relations. 
Doubling the number cf tuples returned in a query can result 
in approximately a 20% increase (i.e., 1/3 second increase 

in the response time cn the average) in the smaller tuples 
and a 10% increase (i.e., 7 seconds on the average) in the 

larger tuples. Figures 5.1 and 5.2 show the results of a 
25% projection over varying tuple widths, with Figure 5.1 
for a 5% selection and Figure 5.2 for a 10% selection. As 
can be seen, the graphs in these two figures are nearly 
identical. This is also the case for the graphs on rhe 50% 
and 75% projections. For example, in Figures 5.3 and 5.4, 
similar graphs for the 5% selection with 50% and 75% 
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Figure 5.1 25% Projections on 5* Selections 
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Figure 5.2 25% Projections on 10% Selections 
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Figure 5.3 50X Projections on 5 % Selections 
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Figure 5*4 75$ Projections on 5$ Selections 



projections are displayed respectively . In each graph of 
the aforementioned figures response times increase almost 
linearly as the relation size increases# and increase 
dramatically as the number of attribute values returned 
increases. 

Figures 5.5 and 5.6 give a different perspective on 
the same data. In this case the time for differing projec- 
tion sizes is graphed over a constant tuple width. As 
expected, the greater the number of attribute values 
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projection measures, each such query is repeated 10 times. 
The 5* selections are done on the ?5 field and a different 
value is used in the qualifier for each of the 10 queries. 

As would be expected each figure shows a marked 

difference in the response time as the number of attribute 
values returned is increased. The smaller- width tuples in 
Figures 5.7 and 5.8 show a nearly linear increase in the 

response time as the relation size (the number cf tuples of 
the same tuple width) increases, and an increase in the 

sio p e of the line as the projection size increases. In 

Figure 5.7 the response time for full select is strictly 
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Figure 5.5 Effects cf Differing Projection Percentages for 200-byte Tuples. 
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Figure 5.6 Effects cf Differing Projectin Percentages for 1000-byte Tuples 



58 select ton 2-megabyte cache wtth accelerator 




CD 

N 



<J) 



C 

o 



Q) 

C 



( Dss) auiji asuodsaj 



55 



Figure 5.7 Comparison of Projection and Selection for 100-byte Tuples 
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Figure 5.8 Comparison of Projection and Selection for 200-byte Tuples. 
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Figure 5.9 Comparison of Projection and Selection for 1000-byte Tuples 
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igure 5.10 Comparison of Projection and Selection for 2000-byte Tup 



smaller than any projection time, which indicates that for 
the smaller tuples the backend does a strict selection prior 
to extracting the attribute values specified in the projec- 
tion qualifier. As the tuple width increases, the full 
select may take more time than that of the projection. For 
the 200-byte tuple in Figure 5.8, the full select time is 
again nearly linear, and the times are slightly mere than 
the times for a 25^ projection. The difference in response 
between the full select and the 253 projection steadily 
increases as the relation size increases, but e v - n so th<= 
full select is faster than the 5055 and 755? projections. 

For much-bigger-wid th tuples. Figures 5.9 and 5.10 
show that the full select time is higher than the projection 
time for the small percentage projections. The full select, 
however, has a much smaller slope, thereby crossing the line 
of the projection time and eventually showing a trend of 
quicker response as the relation size increases. Also of 
particular note is the uniformity of the curves for the 
varying projections in the 1000-byte and 2000-byte tuples in 
Figures 5.9 and 5.10. In contrast, for the smaller tuples 
the lines are nearly linear with increasing slopes. The 
lines for the larger tuples are not linear and the slopes 
are very even. 

E. C0NCI0SI0NS 

In general, the projection results are very predictable 
in that the response time is nearly linear and the response 
time increases as the amount of data returned increases. 
The amount cf data may be determined by either the relation 
size cr the projection size. 

The full select comparisons in Figures 5.7, 5.8, 5.9, 

and 5.10, on the other hand, shew seme unanticipated 
results. Instead of showing a clear advantage in the 



59 



response time 


for f 


ull 


S 9 


Is 


ct in 


a 


11 


relation 


s i ! 


29 S 


9 


a s 


might 


be -x ; 


fee 


fed, 


the 


res 


ul 


ts vary 


wi 


th the 


tU 


pl<= 


wi 




hs • 


In the 


s mal' 


Ler 


tupl 


e wi 


ath 


as dep 


ict 


ed 


in Fig 


ur 


e 5, 


.7, 




the 


full s 


elect 


a? 


pears 


tc r 


un 


fa 


ster 


8 V 


en 


though 


t h 


€ amount 


of 


data 


returned 


is g 


reat 


er . 




For 


t 


he 


200-byt 


9 


tUj 


ol e 


C 


as 


depict 


ed in 


Fi 


gure 5 


.8, 


th 


e 


r elat 


ion 


shi 


p is ma 


rk 


p dl 1 


i a 


if 


19 — 


rent . 


For 


th 


e larg 


er * 


upl 


es 


as 


gra 


phed in Fi 


gu 


r es 


5. 


9 


ana 


% 

o 

• 

in 


the 


fu 


11 sel 


6Ct 


reg 


ui 


r es 


aor 


e t 


ime for 




-he 


s m 






relations. 


N 


e verth 


e les 


s. 


2, 


ts advan 


tag 


e becom 


9 S 


ev: 


Lde 


T> — 


as 


the r e 


lation s 


ize in 


crea 


se s 


• 


In 


su 


mm ary, th 


cl: 


fui: 


L-s 


Pi 


act 


operation is s 


ensiti 


V 6 t 


o t 


he 


widt 


h o 


f t 


he tupl 


es 


: 


En 


Ct 


her 


words , 


the 


grea ter 


is 


the 


±. 


uple 


via 


-KV, • 

v 


the 


hi 


ghei 


: i 


3 


^he 


select 


time . 


• 


The fu 


11-s 


ele 


ct 


oper 


ation 


is also 


3 


ans: 


Lt i 


ve 


to 


the size cf t 


he re 


lati 


ons 


9 


alt houg 


h 


in an 


cp 


pos: 


ite 


w 


*y- 


That i 


s. 


the 


large 


r is 


th 


e 


rela 


tio 


n; 


the sm 


ai 


ler 


n 


s 


the 


select 


time 


in 


propo 


rtio 


n t 


c 


the p 


ro j 


act 


ion tim 


9 . 











It is difficult tc determine what effec" ■‘■he cache and 
accelerator with other configurations may play in fhes c 
tests. J need exists for more research in this area to 
verify the figures and collect more data over a wider range 
cf tuple widths and relation sizes in hopes of obtaining a 
clearer trend to the relationship of the full select and the 
projections as the widths and sizes varies. 



60 



71. CONCLUDING REMARKS 

A. C7ERALL OBSERVATIONS OF THE MACHINE PERFORMANCE 

The experiments described in Chapters IV and V show seme 
predictable results as well as some unexpected surprises. 
Generally t he simple select operations, with or without 
indicies, display expected trends. The response rime 
increases as the amount of data to be returned ■‘•o the host 
increases, as shown in Figures 4.1 and 4.5. A similar trend 
is seen for relations with compressed attribute values. As 
Figure 4.6 illustrates, reduction in the response time can 
be significant for the large tuple widths where the degree 
of compression is high. The relations with indicies also 
show expected improvements in the response time for 
retrieves gualified cn these attribute values. 

Seme unexpected results, however, are seen for the test 
results dealing with ordered retrieves. Figure 4.8. The 
backend shows an unexpected superiority in sorting over the 
host for smaller -size relations. Even for the large rela- 
tions, up to 10000 tuples, the backend maintains a response 



time comparable 


with the 


host. One would exp 


act that 


the 


mainframe would 


have a 


significant advantage 


in ccmput 


ing 


powe r ana show 


a major 


improvement when the 


re la tion 


is 



ordered in the host instead of in the backend. 

Another interesting result is the effect of clustered 
and ton-clustered indicies on ordered retrieves. Creating 
a clustered index on a relation will cause the tuples to be 
stored in a specific order while a non-clust ered index does 
not imply any ordering of the tuples. Figure 4.3 shows very 
similar response times throughout the range of relation 
sizes, regardless cf whether the index is clustered or 
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B. DATABASE AND MACHINE LIMITATIONS 

When considering the test environment, two specific 
limitations stand above all else. The first of these is the 
low resolution of the clock from which measurements are 
taken. The standardized use of the GETTIM 3 function 
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throughout -‘•he tes*s has made comparison of various test 
results over differing periods meaningful. Even so, the low 
resolution makes the need for average times over many 
similar test runs a necessity. This greatly limits the 
amount of time that one can spend in running more meaningful 
tests and in verifying previous results. A great effort has 
been mads to find some other timing mechanism. In the end, 
GETTIME proves to be the easiest to use, the most 

consistent, and, most importantly, the easiest to control. 

The second limitation concerns the system configuration 
and the inability to control the environment of both the 
host and the backend. The performance of these tests has 
not been a very high priority of the parent command at Pt. 
Mugu. This is to be expected, since the host machine is in 
a production environment. Gaining exclusive use is very 
difficult and extremely costly. with this restriction, our 
tests are limited to weekend and evening runs, at times of 
relatively low activity. This significantly reduced the 
time of system availability. Also, in terms of the environ- 
ment, the backend system we used is a relatively new piece 
of equipment. Lastly, the sytem configuration has been 
changing frequently during the experimentation period. The 
time each configuration becomes available has been short. 
Consequently, not enough data can be collected tc make any 
significant comparisons. 

C. RECOMMENDATIONS ECR FUTURE BENCHMARKING EFFORTS 

In light of the test results discussed here, the direc- 
tion of future work should be toward effects of various 
indicies and ordering capabilities. The results of tests on 
various types of indicies and the ordering of relations show 
the most startling results. In addition, some work is 
required over a wider range of tuple widths tc refine 
previous results. 
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Another aspect that warrants rssaarch is a mix of tests 
to simulate a more realistic system load, specifically rests 
with multiple users cf the backend and a more realistic host 
workload. The tests in this thesis are runs on an unloaded 
system. In actuality, the use of the system will most 
likely occur closer to peak loading. Perhaps different 
trends may develop when the host and/or backend are 
subjected tc different load conditions. 

Sven though these tests are cn a specific system, they 
are general enough in nature to provide insight for tests on 
other relational machines and to aid in making a comparison 
cf different backends. 
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